We consider an alternative approach to the foundations of statistical mechanics, in which subjective randomness, ensemble-averaging or time-averaging are not required. Instead, the universe (i.e. the system together with a sufficiently large environment) is in a quantum pure state subject to a global constraint, and thermalisation results from entanglement between system and environment. We formulate and prove a "General Canonical Principle", which states that the system will be thermalised for almost all pure states of the universe, and provide rigorous quantitative bounds using Levy's Lemma.



I. INTRODUCTION 

Despite many years of research, the foundations of sta- 
tistical mechanics remain a controversial subject. Crucial 
questions regarding the role of probabilities and entropy 
(which are viewed both as measures of ignorance and ob- 
jective properties of the state) are not satisfactorily re- 
solved, and the relevance of time averages and ensemble 
averages to individual physical systems is unclear. 

Here we adopt a fundamentally new viewpoint sug- 
gested by Yakir Aharonov [1], which is uniquely quan- 
tum, and which does not rely on any ignorance proba- 
bilities in the description of the state. We consider the 
global state of a large isolated system, the 'universe', to 
be a quantum pure state. Hence there is no lack of knowl- 
edge about the state of the universe, and the entropy of 
the universe is zero. However, when we consider only 
part of the universe (that we call the 'system'), it is pos- 
sible that its state will not be pure, due to quantum en- 
tanglement with the rest of the universe (that we call 
the 'environment'). Hence there is an objective 'lack of 
knowledge' about the state of the system, even though we 
know everything about the state of the universe. In such 
cases, the entropy of the system is non-zero, even though 
we have introduced no randomness and the universe itself 
has zero entropy. 

Furthermore, interactions between the system and en- 
vironment can objectively increase both the entropy of 
the system and that of the environment by increasing 
their entanglement. It is conceivable that this is the 
mechanism behind the second law of thermodynamics. 
Indeed, as information about the system will tend to leak 
into (and spread out in) the environment, we might well 
expect that their entanglement (and hence entropy) will 
increase over time in accordance with the second law. 

The above ideas provide a compelling vision of the 
foundations of statistical mechanics. Such a viewpoint 
has been independently proposed recently by Gemmer et 
al. [2]. 

In this paper, we address one particular aspect of the above programme. We show that thermalisation is a generic property of pure states of the universe, in the sense that for almost all of them, the reduced state of the system is the canonical mixed state. That is, not only is the state of the system mixed (due to entanglement with the rest of the universe), but it is in precisely the state we would expect from standard statistical arguments.

In fact, we prove a stronger result. In the standard 
statistical setting, energy constraints are imposed on the 
state of the universe, which then determine a correspond- 
ing temperature and canonical state for the system. Here 
we consider that states of the universe are subject to 
arbitrary constraints. We then show that almost every 
pure state of the universe subject to those constraints is 
such that the system is in the corresponding generalised 
canonical state. 

Our results are kinematic, rather than dynamical. 
That is, we do not consider any particular unitary evolu- 
tion of the global state, and we do not show that thermal- 
isation of the system occurs. However, because almost 
all states of the universe are such that the system is in a 
canonical thermal state, we anticipate that most evolu- 
tions will quickly carry a state in which the system is not 
thermalised to one in which it is, and that the system 
will remain thermalised for most of its evolution. 

A key ingredient in our analysis is Levy's Lemma [3, 4], 
which plays a similar role to the law of large numbers 
and governs the properties of typical states in large- 
dimensional Hilbert spaces. Levy's Lemma has already 
been used in quantum information theory to study en- 
tanglement and other correlation properties of random 
states in large bipartite systems [5]. It provides a very 
powerful tool with which to evaluate functions of ran- 
domly chosen quantum states. 

The structure of this paper is as follows. In section 
II we present our main result in the form of a General 
Canonical Principle. In section III we support this principle
with precise mathematical theorems. In section IV
we introduce Levy's Lemma, which is used in sections 
V and VI to provide proofs of our main theorems. Sec- 
tion VII illustrates these results with the simple example 
of spins in a magnetic field. Finally, in section VIII we 
present our conclusions. 
II. GENERAL CANONICAL PRINCIPLE

Consider a large quantum mechanical system, 'the universe', that we decompose into two parts, the 'system' $S$ and the 'environment' $E$. We will assume that the dimension of the environment is much larger than that of the system. Consider now that the state of the universe obeys some global constraint $R$. We can represent this quantum mechanically by restricting the allowed states of the system and environment to a subspace $\mathcal{H}_R$ of the total Hilbert space:

$$\mathcal{H}_R \subseteq \mathcal{H}_S \otimes \mathcal{H}_E, \tag{1}$$

where $\mathcal{H}_S$ and $\mathcal{H}_E$ are the Hilbert spaces of the system and environment, with dimensions $d_S$ and $d_E$ respectively. In standard statistical mechanics $R$ would typically be a restriction on the total energy of the universe, but here we leave $R$ completely general.

We define $\mathcal{E}_R$, the equiprobable state of the universe corresponding to the restriction $R$, by

$$\mathcal{E}_R = \frac{\mathbb{1}_R}{d_R}, \tag{2}$$

where $\mathbb{1}_R$ is the identity (projection) operator on $\mathcal{H}_R$, and $d_R$ is the dimension of $\mathcal{H}_R$. $\mathcal{E}_R$ is the maximally mixed state in $\mathcal{H}_R$, in which each pure state has equal probability. This corresponds to the standard intuition of assigning equal a priori probabilities to all states of the universe consistent with the constraints.

We define $\Omega_S$, the canonical state of the system corresponding to the restriction $R$, as the quantum state of the system when the universe is in the equiprobable state $\mathcal{E}_R$. The canonical state of the system $\Omega_S$ is therefore obtained by tracing out the environment in the equiprobable state of the universe:

$$\Omega_S = \mathrm{Tr}_E(\mathcal{E}_R). \tag{3}$$

We now come to the main idea behind our paper.

As described in the introduction, we now consider that the universe is in a pure state $|\phi\rangle$, and not in the mixed state $\mathcal{E}_R$ (which represents a subjective lack of knowledge about its state). We prove that despite this, the state of the system is very close to the canonical state $\Omega_S$ in almost all cases. That is, for almost every pure state of the universe, the system behaves as if the universe were actually in the equiprobable mixed state $\mathcal{E}_R$.

We now state this basic qualitative result as a general 
principle, that will subsequently be refined by quantita- 
tive theorems: 

General Canonical Principle: Given a sufficiently small subsystem of the universe, almost every pure state of the universe is such that the subsystem is approximately in the canonical state $\Omega_S$.

Recalling that the canonical state of the system $\Omega_S$ is, by definition, the state of the system when the universe is in the equiprobable state $\mathcal{E}_R$, we can interpret the above principle as follows:

Principle of Apparently Equal a priori Probability: For almost every pure state of the universe, the state of a sufficiently small subsystem is approximately the same as if the universe were in the equiprobable state $\mathcal{E}_R$. In other words, almost every pure state of the universe is locally (i.e. on the system) indistinguishable from $\mathcal{E}_R$.

For an arbitrary pure state $|\phi\rangle$ of the universe, the state of the system alone is given by

$$\rho_S = \mathrm{Tr}_E(|\phi\rangle\langle\phi|). \tag{4}$$

Our principle states that for almost all states $|\phi\rangle \in \mathcal{H}_R$,

$$\rho_S \approx \Omega_S. \tag{5}$$

Obviously, the above principle is stated qualitatively. To express these results quantitatively, we need to carefully define what we mean by a sufficiently small subsystem, under what distance measure $\rho_S \approx \Omega_S$, and how good this approximation is. This will be done in the remaining sections of the paper.

We emphasise that the above is a generalised principle, in the sense that the restriction $R$ imposed on the states of the universe is completely arbitrary (and is not necessarily the usual constraint on energy or other conserved quantities). Similarly, the canonical state $\Omega_S$ is not necessarily the usual thermal canonical state, but is defined relative to the arbitrary restriction $R$ by equation (3).
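As a minimal illustration of definitions (2) and (3), the following Python sketch computes the canonical state for one concrete restriction. The restriction used here ($n$ two-level systems with exactly $m$ of them in state 1, the system being the first $k$ of them) is an assumption made purely for the example and is not fixed by the general principle. For this choice $\mathcal{H}_R$ is spanned by product basis states, so $\Omega_S$ is diagonal and its entries reduce to counting ratios. The function name `canonical_state_diagonal` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def canonical_state_diagonal(n, k, m):
    """Diagonal of Omega_S = Tr_E(1_R / d_R) for an illustrative restriction R.

    Assumed restriction (for this example only): n two-level systems with
    exactly m of them in state 1; the system S is the first k of them and
    the environment E is the remaining n - k.
    """
    # Basis of H_R: all n-bit strings with exactly m ones.
    basis_R = [bits for bits in product((0, 1), repeat=n) if sum(bits) == m]
    d_R = len(basis_R)

    # Tracing out E: each basis state |s>|e> of H_R contributes 1/d_R
    # to the diagonal entry <s|Omega_S|s>.
    omega_S = {}
    for bits in basis_R:
        s = bits[:k]
        omega_S[s] = omega_S.get(s, Fraction(0)) + Fraction(1, d_R)
    return d_R, omega_S


d_R, omega_S = canonical_state_diagonal(n=5, k=2, m=2)
for s, weight in sorted(omega_S.items()):
    print(s, weight)
```

Exact fractions are used so that the normalisation of $\Omega_S$ can be checked without rounding error; the brute-force enumeration is only practical for small $n$.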

To connect the above principle to standard statistical mechanics, all we have to do is to consider the restriction $R$ to be that the total energy of the universe is close to $E$, which then sets the temperature scale $T$. The total Hamiltonian of the universe $H_U$ is given by

$$H_U = H_S + H_E + H_{\rm int}, \tag{6}$$

where $H_S$ and $H_E$ are the Hamiltonians of the system and environment respectively, and $H_{\rm int}$ is the interaction Hamiltonian between the system and environment. In the standard situation, in which $H_{\rm int}$ is small and the energy spectrum of the environment is sufficiently dense and uniform, the canonical state $\Omega_S^{(E)}$ can be computed using standard techniques, and shown to be

$$\Omega_S^{(E)} \propto \exp\left(-\frac{H_S}{k_B T}\right). \tag{7}$$

This allows us to state the thermal canonical principle that establishes the validity (at least kinematically) of the viewpoint expressed in the introduction.

Thermal Canonical Principle: Given that the total energy of the universe is approximately $E$, interactions between the system and the rest of the universe are weak, and that the energy spectrum of the universe is sufficiently dense and uniform, almost every pure state of the universe is such that the state of the system alone is approximately equal to the thermal canonical state $e^{-H_S/k_B T}$, with temperature $T$ (corresponding to the energy $E$).

We emphasise here that our contribution in this paper is to show that $\rho_S \approx \Omega_S$, and has nothing to do with showing that $\Omega_S \propto e^{-H_S/k_B T}$, which is a standard problem in statistical mechanics.

Finally, we note that the General Canonical Principle applies also in the case where the interaction between the system and environment is not small. In such situations, the canonical state of the system is no longer $e^{-H_S/k_B T}$, since the behaviour of the system will depend very strongly on $H_{\rm int}$. Nevertheless, the general principle remains valid for the corresponding generalised canonical state $\Omega_S$. Furthermore our principle will apply to arbitrary restrictions $R$ that have nothing to do with energy, which may lead to many interesting insights.



III. QUANTITATIVE SETUP 
AND MAIN THEOREMS 

We now formulate and prove precise mathematical theorems corresponding to the General Canonical Principle stated in the previous section.

As a measure of the distance between $\rho_S$ and $\Omega_S$, we use the trace-norm $\|\rho_S - \Omega_S\|_1$, where [6]

$$\|M\|_1 = \mathrm{Tr}|M| = \mathrm{Tr}\sqrt{M^\dagger M}, \tag{8}$$

as this distance will be small if and only if it would be hard for any measurement to tell $\rho_S$ and $\Omega_S$ apart. Indeed, $\|M\|_1 = \sup_{\|O\| \le 1} \mathrm{Tr}(MO)$, where the maximisation is over all operators (observables) $O$ with operator norm bounded by 1.

In our analysis, we also make use of the Hilbert-Schmidt norm $\|M\|_2 = \sqrt{\mathrm{Tr}(M^\dagger M)}$, which is easier to manipulate than $\|M\|_1$. However, we only use this for intermediate calculational purposes, as it does not have the desirable physical properties of the trace-norm. In particular $\|\rho_S - \Omega_S\|_2$ can be small even when the two states are orthogonal for high-dimensional systems.

Throughout this paper we denote by $\langle \cdot \rangle$ the average over states $|\phi\rangle \in \mathcal{H}_R$ according to the uniform distribution. For example, it is easy to see that $\Omega_S = \langle \rho_S \rangle$.

We will prove the following theorems: 

Theorem 1 For a randomly chosen state $|\phi\rangle \in \mathcal{H}_R \subseteq \mathcal{H}_S \otimes \mathcal{H}_E$ and arbitrary $\epsilon > 0$, the distance between the reduced density matrix of the system $\rho_S = \mathrm{Tr}_E(|\phi\rangle\langle\phi|)$ and the canonical state $\Omega_S = \mathrm{Tr}_E\,\mathcal{E}_R$ is given probabilistically by

$$\mathrm{Prob}\big[\|\rho_S - \Omega_S\|_1 \ge \eta\big] \le \eta', \tag{9}$$

where

$$\eta = \epsilon + \sqrt{\frac{d_S}{d_E^{\rm eff}}}, \tag{10}$$

$$\eta' = 2\exp\left(-C d_R \epsilon^2\right). \tag{11}$$

In these expressions, $C$ is a positive constant (given by $C = (18\pi^3)^{-1}$), $d_S$ and $d_R$ are the dimensions of $\mathcal{H}_S$ and $\mathcal{H}_R$ respectively, and $d_E^{\rm eff}$ is a measure of the effective size of the environment, given by

$$d_E^{\rm eff} = \frac{1}{\mathrm{Tr}\,\Omega_E^2} \ge \frac{d_R}{d_S}, \tag{12}$$

where $\Omega_E = \mathrm{Tr}_S\,\mathcal{E}_R$. Both $\eta$ and $\eta'$ will be small quantities, and thus the state will be close to the canonical state with high probability, whenever $d_E^{\rm eff} \gg d_S$ (i.e. the effective dimension of the environment is much larger than that of the system) and $d_R \epsilon^2 \gg 1 \gg \epsilon$. This latter condition can be ensured when $d_R \gg 1$ (i.e. the total accessible space is large), by choosing $\epsilon = d_R^{-1/3}$.

This theorem gives rigorous meaning to our statements in section II about thermalisation being achieved for 'almost all' states: we have an exponentially small bound on the relative volume of the exceptional set, i.e. on the probability of finding the system in a state that is far from the canonical state. Interestingly, the exponent scales with the dimension of the space $\mathcal{H}_R$ of the constraint, while the deviation from the canonical state is characterised by the ratio between the system size and the effective size of the environment, which makes intuitive sense.

Theorem 1 provides a bound on the distance between $\rho_S$ and $\Omega_S$, but in many situations we can further improve it. Often the system does not really occupy all of its Hilbert space $\mathcal{H}_S$, and also the estimate of the effective environment dimension $d_E^{\rm eff}$ may be too small, due to exceptionally large eigenvalues of $\Omega_E = \mathrm{Tr}_S\,\mathcal{E}_R$. By cutting out these non-typical components (similar to the well-known method of projecting onto the typical subspace), we can optimize the bound obtained, as we will show in Theorem 2. The benefits of this optimization will be apparent in section VII, where we consider a particular example.

Theorem 2 Assume that there exists some bounded positive operator $X_R$ on $\mathcal{H}_R$ satisfying $0 \le X_R \le \mathbb{1}$ such that, with $\tilde{\mathcal{E}}_R = \sqrt{X_R}\,\mathcal{E}_R\sqrt{X_R}$,

$$\mathrm{Tr}(\tilde{\mathcal{E}}_R) = \mathrm{Tr}(\mathcal{E}_R X_R) \ge 1 - \delta. \tag{13}$$

(I.e. the probability of obtaining the outcome corresponding to measurement operator $X_R$ in a generalised measurement (POVM) on $\mathcal{E}_R$ is approximately one.)

Then, for a randomly chosen state $|\phi\rangle \in \mathcal{H}_R \subseteq \mathcal{H}_S \otimes \mathcal{H}_E$ and $\epsilon > 0$,

$$\mathrm{Prob}\big[\|\rho_S - \Omega_S\|_1 \ge \tilde{\eta}\big] \le \tilde{\eta}', \tag{14}$$

where

$$\tilde{\eta} = \epsilon + \sqrt{\frac{\tilde{d}_S}{\tilde{d}_E^{\rm eff}}} + 4\sqrt{\delta}, \tag{15}$$

$$\tilde{\eta}' = 2\exp\left(-C d_R \epsilon^2\right). \tag{16}$$

Here, $C$ and $d_R$ are as in Theorem 1, $\tilde{d}_S$ is the dimension of the support of $X_R$ in $\mathcal{H}_S$, and $\tilde{d}_E^{\rm eff}$ is the effective size of the environment after applying $X_R$, given by

$$\tilde{d}_E^{\rm eff} = \frac{1}{\mathrm{Tr}\,\tilde{\Omega}_E^2} \ge \frac{d_R}{\tilde{d}_S}, \tag{17}$$

where $\tilde{\Omega}_E = \mathrm{Tr}_S(\tilde{\mathcal{E}}_R)$. In many situations $\delta$ can be made very small, while at the same time improving the relation between system and effective environment dimension. Note that the above is essentially the technique of the smooth (quantum) Renyi entropies [7, 8]: $\log \tilde{d}_S$ is related to $S_0^\delta(\Omega_S)$ and $\log \tilde{d}_E^{\rm eff}$ to $S_2^\delta(\Omega_E)$.
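To get a feel for the orders of magnitude involved, the quantities $\eta$, $\eta'$ of Theorem 1 and $\tilde{\eta}$, $\tilde{\eta}'$ of Theorem 2 can be evaluated numerically. The sketch below is only an illustration: the function name `canonical_bound` and the sample dimensions are assumptions chosen for the example, not values taken from the analysis.

```python
import math

C = 1.0 / (18 * math.pi ** 3)  # the constant quoted in Theorem 1


def canonical_bound(d_S, d_E_eff, d_R, eps, delta=0.0):
    """Return (eta, eta_prime) as in (10)-(11), or (15)-(16) when delta > 0.

    For Theorem 2, pass the tilde quantities for d_S and d_E_eff.
    """
    eta = eps + math.sqrt(d_S / d_E_eff) + 4.0 * math.sqrt(delta)
    eta_prime = 2.0 * math.exp(-C * d_R * eps ** 2)
    return eta, eta_prime


# Illustrative numbers only: a 4-dimensional system, a large environment.
d_R = 10 ** 12
eps = d_R ** (-1.0 / 3.0)          # the choice suggested after Theorem 1
eta, eta_prime = canonical_bound(d_S=4, d_E_eff=10 ** 6, d_R=d_R, eps=eps)
print(eta, eta_prime)
```

Because $C$ is small, $d_R$ must be quite large before $\eta'$ becomes negligible for $\epsilon = d_R^{-1/3}$; this is no obstacle for macroscopic systems, where $d_R$ grows exponentially with the number of constituents.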

In the process of proving these theorems, we also obtain the following subsidiary results:

1. The average distance between the system's reduced density matrix for a randomly chosen state and the canonical state will be small whenever the effective environment size is larger than the system. Specifically,

$$\langle \|\rho_S - \Omega_S\|_1 \rangle \le \sqrt{\frac{d_S}{d_E^{\rm eff}}} \le \sqrt{\frac{d_S^2}{d_R}}, \tag{18}$$

where the effective dimension of the environment $d_E^{\rm eff}$ is given by (12).

2. With high probability, the expectation value of a bounded observable $O_S$ on the system for a randomly chosen state will be very similar to its expectation value in the canonical state whenever $d_R \gg 1$. Specifically,

$$\mathrm{Prob}\Big[\big|\mathrm{Tr}(O_S\rho_S) - \mathrm{Tr}(O_S\Omega_S)\big| \ge d_R^{-1/3}\Big] \le 2\exp\left(-\frac{C d_R^{1/3}}{\|O_S\|^2}\right), \tag{19}$$

where $C$ is a constant.



In our analysis we use two alternative methods, with 
the hope that the different mathematical techniques em- 
ployed will aid in future exploration of the field. 



IV. LEVY'S LEMMA 

A major component in the proofs of the following sections is the mathematical theorem known as Levy's Lemma [3, 4], which states that when a point $\phi$ is selected at random from a hypersphere of high dimension and $f(\phi)$ does not vary too rapidly, then $f(\phi) \approx \langle f \rangle$ with high probability:

Lemma 3 (Levy's Lemma) Given a function $f : S^d \to \mathbb{R}$ defined on the $d$-dimensional hypersphere $S^d$, and a point $\phi \in S^d$ chosen uniformly at random,

$$\mathrm{Prob}\big[\,|f(\phi) - \langle f \rangle| \ge \epsilon\,\big] \le 2\exp\left(\frac{-2C(d+1)\epsilon^2}{\eta^2}\right), \tag{20}$$

where $\eta$ is the Lipschitz constant of $f$, given by $\eta = \sup|\nabla f|$, and $C$ is a positive constant (which can be taken to be $C = (18\pi^3)^{-1}$).

Due to normalisation, pure states in $\mathcal{H}_R$ can be represented by points on the surface of a $(2d_R - 1)$-dimensional hypersphere $S^{2d_R - 1}$, and hence we can apply Levy's Lemma to functions of the randomly selected quantum state $\phi$ by setting $d = 2d_R - 1$. For such a randomly chosen state $|\phi\rangle \in \mathcal{H}_R$, we wish to show that $\|\rho_S - \Omega_S\|_1 \approx 0$ with high probability.
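The right-hand side of (20) is simple to evaluate, and doing so makes the substitution $d = 2d_R - 1$ explicit. The following sketch uses hypothetical function names (`levy_tail_bound`, `pure_state_tail_bound`) and only restates the formula of Lemma 3 under the assumption $C = (18\pi^3)^{-1}$.

```python
import math

C = 1.0 / (18 * math.pi ** 3)  # constant quoted in Lemma 3


def levy_tail_bound(d, eps, lipschitz):
    """Right-hand side of (20): bound on Prob[|f - <f>| >= eps] on S^d."""
    return 2.0 * math.exp(-2.0 * C * (d + 1) * eps ** 2 / lipschitz ** 2)


def pure_state_tail_bound(d_R, eps, lipschitz):
    """The same bound for pure states in H_R, i.e. with d = 2 * d_R - 1."""
    return levy_tail_bound(2 * d_R - 1, eps, lipschitz)


# With Lipschitz constant 2 the exponent reduces to C * d_R * eps**2.
print(pure_state_tail_bound(d_R=1000, eps=0.1, lipschitz=2.0))
```

For a function with Lipschitz constant at most 2 the exponent becomes $C d_R \epsilon^2$, which is the form in which the lemma is used in the theorems of section III.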



V. METHOD I: APPLYING LEVY'S
LEMMA TO $\|\rho_S - \Omega_S\|_1$

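In this section, we consider the consequences of applying Levy's Lemma directly to the distance between $\rho_S = \mathrm{Tr}_E(|\phi\rangle\langle\phi|)$ and $\Omega_S$, by choosing

$$f(\phi) = \|\rho_S - \Omega_S\|_1 \tag{21}$$

in (20). As we prove in appendix A, the function $f(\phi)$ has Lipschitz constant $\eta \le 2$. Applying Levy's Lemma to $f(\phi)$ (with $d + 1 = 2d_R$ and $\eta \le 2$, so that the exponent in (20) is at least $C d_R \epsilon^2$) then gives:

$$\mathrm{Prob}\Big[\,\big|\,\|\rho_S - \Omega_S\|_1 - \langle\|\rho_S - \Omega_S\|_1\rangle\,\big| \ge \epsilon\,\Big] \le 2e^{-C d_R \epsilon^2}. \tag{22}$$

To obtain Theorem 1, we rearrange this equation to get

$$\mathrm{Prob}\big[\|\rho_S - \Omega_S\|_1 \ge \eta\big] \le \eta', \tag{23}$$

where

$$\eta = \epsilon + \langle\|\rho_S - \Omega_S\|_1\rangle, \tag{24}$$

$$\eta' = 2\exp\left(-C d_R \epsilon^2\right). \tag{25}$$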



The focus of the following subsections is to obtain a bound on $\langle\|\rho_S - \Omega_S\|_1\rangle$. In section V C we show that

$$\langle\|\rho_S - \Omega_S\|_1\rangle \le \sqrt{\frac{d_S}{d_E^{\rm eff}}}, \tag{26}$$

where $d_E^{\rm eff}$ is a measure of the effective size of the environment, given by (12). Inserting equation (26) in (24) we obtain Theorem 1.

Typically $d_R \gg 1$ (the total number of accessible states is large) and hence by choosing $\epsilon = d_R^{-1/3}$ we can ensure that both $\epsilon$ and $\eta'$ are small quantities. When it is also true that $d_E^{\rm eff} \gg d_S$ (the environment is much larger than the system) both $\eta$ and $\eta'$ will be small quantities, leading to $\|\rho_S - \Omega_S\|_1 \approx 0$ with high probability.






To obtain Theorem 2, we consider a generalised measurement which has an almost certain outcome for the equiprobable state $\mathcal{E}_R$ on $\mathcal{H}_R$, and apply the corresponding measurement operator before proceeding with our analysis. By an appropriate choice of measurement operator, the ratio of the system and environment's effective dimensions can be significantly improved (as shown by the example in section VII A).



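A. Calculating $\langle\|\rho_S - \Omega_S\|_1\rangle$

As mentioned in section III, although $\|\rho_S - \Omega_S\|_1$ is a physically meaningful quantity, it is difficult to work with directly, so we first relate it to the Hilbert-Schmidt norm $\|\rho_S - \Omega_S\|_2$. The two norms are related by

$$\|\rho_S - \Omega_S\|_1 \le \sqrt{d_S}\,\|\rho_S - \Omega_S\|_2, \tag{27}$$

as proved in Appendix A.

Expanding $\|\rho_S - \Omega_S\|_2$ we obtain

$$\langle\|\rho_S - \Omega_S\|_2\rangle \le \sqrt{\langle\|\rho_S - \Omega_S\|_2^2\rangle} \tag{28}$$

$$= \sqrt{\langle\mathrm{Tr}(\rho_S - \Omega_S)^2\rangle} = \sqrt{\langle\mathrm{Tr}\,\rho_S^2\rangle - 2\,\mathrm{Tr}(\langle\rho_S\rangle\Omega_S) + \mathrm{Tr}\,\Omega_S^2} = \sqrt{\langle\mathrm{Tr}\,\rho_S^2\rangle - \mathrm{Tr}\,\Omega_S^2}, \tag{29}$$

where the inequality follows from the concavity of the square root and the last equality uses $\langle\rho_S\rangle = \Omega_S$. Hence

$$\langle\|\rho_S - \Omega_S\|_1\rangle \le \sqrt{d_S\left(\langle\mathrm{Tr}\,\rho_S^2\rangle - \mathrm{Tr}\,\Omega_S^2\right)}. \tag{30}$$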

B. Calculating $\langle\mathrm{Tr}(\rho_S^2)\rangle$

In this section we show the fundamental inequality

$$\langle\mathrm{Tr}\,\rho_S^2\rangle \le \mathrm{Tr}\langle\rho_S\rangle^2 + \mathrm{Tr}\langle\rho_E\rangle^2. \tag{31}$$

The following calculations and estimates are closely related to the arguments used in random quantum channel coding [9] and random entanglement distillation (see [10]).

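To calculate $\langle\mathrm{Tr}\,\rho_S^2\rangle$, it is helpful to introduce a second copy of the original Hilbert space, extending the problem from $\mathcal{H}_R$ to $\mathcal{H}_R \otimes \mathcal{H}_{R'}$ where $\mathcal{H}_{R'} \subseteq \mathcal{H}_{S'} \otimes \mathcal{H}_{E'}$.

Note that

$$\mathrm{Tr}_S(\rho_S^2) = \sum_k (\rho_S^2)_{kk} = \sum_{k,l,k',l'} \rho_{kl}\,\rho_{k'l'}\,\langle l\,l'|F_{SS'}|k\,k'\rangle = \mathrm{Tr}_{SS'}\big((\rho_S \otimes \rho_{S'})F_{SS'}\big), \tag{32}$$

where $F_{SS'}$ is the flip (or swap) operation $S \leftrightarrow S'$:

$$F_{SS'} = \sum_{s,s'} |s'\rangle\langle s|_S \otimes |s\rangle\langle s'|_{S'}, \tag{33}$$

and hence

$$\mathrm{Tr}_S\,\rho_S^2 = \mathrm{Tr}_{RR'}\Big(\big(|\phi\rangle\langle\phi| \otimes |\phi\rangle\langle\phi|\big)_{RR'}\big(F_{SS'} \otimes \mathbb{1}_{EE'}\big)\Big). \tag{34}$$

So, our problem reduces to the calculation of

$$V = \big\langle |\phi\rangle\langle\phi| \otimes |\phi\rangle\langle\phi| \big\rangle = \int \big(|\phi\rangle\langle\phi| \otimes |\phi\rangle\langle\phi|\big)\, d\phi. \tag{35}$$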



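As $V$ is invariant under operations of the form $V \to (U \otimes U)V(U^\dagger \otimes U^\dagger)$ for any unitary $U$, representation theory implies that

$$V = \alpha\,\Pi^{\rm sym}_{RR'} + \beta\,\Pi^{\rm anti}_{RR'}, \tag{36}$$

where $\Pi^{\rm sym}_{RR'}$ and $\Pi^{\rm anti}_{RR'}$ are projectors onto the symmetric and antisymmetric subspaces of $\mathcal{H}_R \otimes \mathcal{H}_{R'}$ respectively, and $\alpha$ and $\beta$ are constants. As

$$\langle\phi\phi|\,\frac{1}{\sqrt{2}}\big(|ab\rangle - |ba\rangle\big) = 0 \quad \forall\, a, b, \phi, \tag{37}$$

it is clear that $\beta = 0$, and as $V$ is a normalised state

$$\alpha = \frac{1}{\dim\big(\Pi^{\rm sym}_{RR'}\big)} = \frac{2}{d_R(d_R + 1)}. \tag{38}$$

Hence

$$V = \frac{2}{d_R(d_R + 1)}\,\Pi^{\rm sym}_{RR'}, \tag{39}$$

and therefore

$$\langle\mathrm{Tr}_S\,\rho_S^2\rangle = \mathrm{Tr}_{RR'}\left(\frac{2\,\Pi^{\rm sym}_{RR'}}{d_R(d_R + 1)}\big(F_{SS'} \otimes \mathbb{1}_{EE'}\big)\right). \tag{40}$$

To proceed further we perform the substitution

$$\Pi^{\rm sym}_{RR'} = \frac{1}{2}\big(\mathbb{1}_{RR'} + F_{RR'}\big), \tag{41}$$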



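where $F_{RR'}$ is the flip operator taking $R \leftrightarrow R'$. Noting that $F_{RR'} = \mathbb{1}_{RR'}(F_{SS'} \otimes F_{EE'})$, this gives

$$\begin{aligned}
\langle\mathrm{Tr}_S\,\rho_S^2\rangle &= \mathrm{Tr}_{RR'}\left(\frac{\mathbb{1}_{RR'}}{d_R(d_R + 1)}\big(F_{SS'} \otimes \mathbb{1}_{EE'}\big)\right) + \mathrm{Tr}_{RR'}\left(\frac{\mathbb{1}_{RR'}}{d_R(d_R + 1)}\big(\mathbb{1}_{SS'} \otimes F_{EE'}\big)\right) \\
&\le \mathrm{Tr}_{RR'}\left(\Big(\frac{\mathbb{1}_R}{d_R} \otimes \frac{\mathbb{1}_{R'}}{d_R}\Big)\big(F_{SS'} \otimes \mathbb{1}_{EE'}\big)\right) + \mathrm{Tr}_{RR'}\left(\Big(\frac{\mathbb{1}_R}{d_R} \otimes \frac{\mathbb{1}_{R'}}{d_R}\Big)\big(\mathbb{1}_{SS'} \otimes F_{EE'}\big)\right) \\
&= \mathrm{Tr}_{SS'}\big((\Omega_S \otimes \Omega_{S'})F_{SS'}\big) + \mathrm{Tr}_{EE'}\big((\Omega_E \otimes \Omega_{E'})F_{EE'}\big).
\end{aligned} \tag{42}$$

Hence from equation (32),

$$\langle\mathrm{Tr}_S(\rho_S^2)\rangle \le \mathrm{Tr}_S\,\Omega_S^2 + \mathrm{Tr}_E\,\Omega_E^2 = \mathrm{Tr}\langle\rho_S\rangle^2 + \mathrm{Tr}\langle\rho_E\rangle^2. \tag{43}$$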
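The flip-operator identity (32) and the inequality (43) can both be checked numerically in small dimensions. The sketch below does this in plain Python for the unconstrained case $\mathcal{H}_R = \mathcal{H}_S \otimes \mathcal{H}_E$, which is an assumption made to keep the example short; in that case $\mathrm{Tr}\,\Omega_S^2 = 1/d_S$, $\mathrm{Tr}\,\Omega_E^2 = 1/d_E$, and evaluating (40) and (41) directly gives $\langle\mathrm{Tr}\,\rho_S^2\rangle = (d_S + d_E)/(d_S d_E + 1)$. All function names are hypothetical, and the Monte Carlo average is only an estimate.

```python
import random


def random_pure_state(d_S, d_E, rng):
    """Uniformly random unit vector in H_S (x) H_E, stored as psi[s][e]."""
    psi = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d_E)]
           for _ in range(d_S)]
    norm = sum(abs(a) ** 2 for row in psi for a in row) ** 0.5
    return [[a / norm for a in row] for row in psi]


def reduced_state(psi):
    """rho_S = Tr_E |psi><psi|, entries sum_e psi[s][e] * conj(psi[t][e])."""
    d_S = len(psi)
    return [[sum(a * b.conjugate() for a, b in zip(psi[s], psi[t]))
             for t in range(d_S)] for s in range(d_S)]


def purity(rho):
    """Tr(rho^2) computed directly."""
    d = len(rho)
    return sum(rho[k][l] * rho[l][k] for k in range(d) for l in range(d)).real


def purity_via_flip(rho):
    """Tr[(rho (x) rho) F], using <l l'|F|k k'> = delta(l, k') * delta(l', k)."""
    d = len(rho)
    total = 0j
    for k in range(d):
        for kp in range(d):
            for l in range(d):
                for lp in range(d):
                    if l == kp and lp == k:
                        total += rho[k][l] * rho[kp][lp]
    return total.real


def average_purity(d_S, d_E, samples, seed=0):
    """Monte Carlo estimate of <Tr rho_S^2> over random pure states."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += purity(reduced_state(random_pure_state(d_S, d_E, rng)))
    return total / samples


d_S, d_E = 2, 4
avg = average_purity(d_S, d_E, samples=2000)
bound = 1 / d_S + 1 / d_E              # right-hand side of (43), full space
exact = (d_S + d_E) / (d_S * d_E + 1)  # (40) evaluated for the full space
print(avg, exact, bound)
```

Normalised complex Gaussian vectors are used because they are uniformly distributed on the unit sphere, which is the measure assumed throughout the averages $\langle\cdot\rangle$.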
C. Bounding $\langle\|\rho_S - \Omega_S\|_1\rangle$

Inserting the results of the last section in equation (30) we obtain

$$\langle\|\rho_S - \Omega_S\|_1\rangle \le \sqrt{d_S\,\mathrm{Tr}\,\Omega_E^2}. \tag{44}$$

Intuitively, we can understand this equation by defining

$$d_E^{\rm eff} \equiv \frac{1}{\mathrm{Tr}\,\Omega_E^2} \tag{45}$$

as the effective dimension of the environment in the canonical state. If all of the non-zero eigenvalues of $\Omega_E$ were of equal weight this would simply correspond to the dimension of $\Omega_E$'s support, but more generally it will measure the dimension of the space in which the environment is most likely to be found. When there is no constraint on the accessible states of the environment, such that $\mathcal{H}_R = \mathcal{H}_S \otimes \mathcal{H}_E$, then $d_E^{\rm eff} = d_E$.

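Denoting the eigenvalues of $\Omega_E$ by $\lambda_E^k$ (with maximum eigenvalue $\lambda_E^{\max}$), it is also interesting to note that

$$\begin{aligned}
\mathrm{Tr}\,\Omega_E^2 = \sum_k (\lambda_E^k)^2 &\le \lambda_E^{\max} \\
&= \max_{|\phi_E\rangle} \langle\phi_E|\,\mathrm{Tr}_S\Big(\frac{\mathbb{1}_R}{d_R}\Big)|\phi_E\rangle \\
&= \max_{|\phi_E\rangle} \sum_s \langle s\,\phi_E|\,\frac{\mathbb{1}_R}{d_R}\,|s\,\phi_E\rangle \\
&\le \frac{d_S}{d_R}.
\end{aligned} \tag{46}$$

Hence $d_E^{\rm eff} \ge d_R/d_S$, and we obtain the final result that

$$\langle\|\rho_S - \Omega_S\|_1\rangle \le \sqrt{\frac{d_S}{d_E^{\rm eff}}} \le \sqrt{\frac{d_S^2}{d_R}}. \tag{47}$$

The average distance $\langle\|\rho_S - \Omega_S\|_1\rangle$ will therefore be small whenever the effective size of the environment is much larger than that of the system ($d_E^{\rm eff} \gg d_S$).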

Inserting the results of equation (47) into equation (24) 
gives Theorem 1. 
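The chain of inequalities in (46) can be made concrete for a small constrained subspace. In the sketch below the restriction is again an assumption chosen for illustration ($n$ two-level systems with exactly $m$ in state 1, the system being the first $k$ of them), and the function name is hypothetical. For this restriction $\Omega_E$ is diagonal, so its eigenvalues can be read off by counting.

```python
from fractions import Fraction
from itertools import product


def effective_environment_dimension(n, k, m):
    """Return (d_R, d_E_eff, Tr Omega_E^2, lambda_max) for an illustrative R.

    Assumed restriction: n-bit strings with exactly m ones; the system is
    the first k bits and the environment is the remaining n - k bits.
    """
    basis_R = [bits for bits in product((0, 1), repeat=n) if sum(bits) == m]
    d_R = len(basis_R)

    # Omega_E = Tr_S(1_R / d_R) is diagonal: each basis state |s>|e> of H_R
    # contributes 1/d_R to the eigenvalue attached to the environment string e.
    omega_E = {}
    for bits in basis_R:
        e = bits[k:]
        omega_E[e] = omega_E.get(e, Fraction(0)) + Fraction(1, d_R)

    purity_E = sum(w * w for w in omega_E.values())
    lambda_max = max(omega_E.values())
    return d_R, 1 / purity_E, purity_E, lambda_max


n, k, m = 5, 2, 2
d_S = 2 ** k
d_R, d_E_eff, purity_E, lambda_max = effective_environment_dimension(n, k, m)
print(d_R, d_E_eff, purity_E, lambda_max)
```

Here $d_E^{\rm eff}$ is noticeably larger than the guaranteed lower bound $d_R/d_S$, which illustrates why the first inequality in (47) is generally the sharper one.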



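D. Improved bounds using restricted subspaces

As mentioned in section III, in many cases it is possible to improve the bounds obtained from Theorem 1 by projecting the states onto a typical subspace before proceeding with the analysis. This can allow one to decrease the effective dimension of the system $d_S$ (by eliminating components with negligible amplitude), and increase the effective dimension of the environment $d_E^{\rm eff} = (\mathrm{Tr}\,\Omega_E^2)^{-1}$ (by eliminating components of $\Omega_E$ with disproportionately high amplitudes), whilst leaving the equiprobable state $\mathcal{E}_R$ largely unchanged.

To allow for the most general possibility, we consider a generalised measurement operator $X_R$ satisfying $0 \le X_R \le \mathbb{1}$ (of which a projector is a special case), which has high probability of being satisfied by $\mathcal{E}_R$, such that

$$\mathrm{Tr}_R(\mathcal{E}_R X_R) \ge 1 - \delta. \tag{48}$$

We denote the dimension of the support of $X_R$ in $\mathcal{H}_S$ by $\tilde{d}_S$, which will play the role of $d_S$ in the revised analysis [11]. The bounds on $\langle\|\rho_S - \Omega_S\|_1\rangle$ will be optimized by choosing $X_R$ such that $\tilde{d}_S$ is as small as possible, and $\tilde{d}_E^{\rm eff}$ as large as possible.

We also define the sub-normalised states obtained after measurement of $X_R$:

$$|\tilde{\phi}\rangle = \sqrt{X_R}\,|\phi\rangle, \qquad \tilde{\rho}_S = \mathrm{Tr}_E\big(|\tilde{\phi}\rangle\langle\tilde{\phi}|\big), \tag{49}$$

$$\tilde{\mathcal{E}}_R = \sqrt{X_R}\,\mathcal{E}_R\sqrt{X_R} = \frac{X_R}{d_R}, \tag{50}$$

$$\tilde{\Omega}_S = \mathrm{Tr}_E(\tilde{\mathcal{E}}_R), \tag{51}$$

$$\tilde{\Omega}_E = \mathrm{Tr}_S(\tilde{\mathcal{E}}_R). \tag{52}$$

Applying the same analysis as in the previous sections to these states, we find

$$\begin{aligned}
\langle\mathrm{Tr}_S\,\tilde{\rho}_S^2\rangle &= \mathrm{Tr}_{RR'}\left(\big(\sqrt{X_R} \otimes \sqrt{X_R}\big)\frac{2\,\Pi^{\rm sym}_{RR'}}{d_R(d_R + 1)}\big(\sqrt{X_R} \otimes \sqrt{X_R}\big)\big(F_{SS'} \otimes \mathbb{1}_{EE'}\big)\right) \\
&= \mathrm{Tr}_{RR'}\left(\big(X_R \otimes X_R\big)\frac{2\,\Pi^{\rm sym}_{RR'}}{d_R(d_R + 1)}\big(F_{SS'} \otimes \mathbb{1}_{EE'}\big)\right) \\
&\le \mathrm{Tr}_{RR'}\left(\Big(\frac{X_R}{d_R} \otimes \frac{X_R}{d_R}\Big)\big(F_{SS'} \otimes \mathbb{1}_{EE'}\big)\right) + \mathrm{Tr}_{RR'}\left(\Big(\frac{X_R}{d_R} \otimes \frac{X_R}{d_R}\Big)\big(\mathbb{1}_{SS'} \otimes F_{EE'}\big)\right) \\
&= \mathrm{Tr}_S\,\tilde{\Omega}_S^2 + \mathrm{Tr}_E\,\tilde{\Omega}_E^2,
\end{aligned} \tag{53}$$

where in the second equality we have used the fact that $[\sqrt{X_R} \otimes \sqrt{X_R},\, \Pi^{\rm sym}_{RR'}] = 0$. From the analogue of equation (30) we can then obtain

$$\langle\|\tilde{\rho}_S - \tilde{\Omega}_S\|_1\rangle \le \sqrt{\frac{\tilde{d}_S}{\tilde{d}_E^{\rm eff}}}, \tag{54}$$

where (using the analogue of (46))

$$\tilde{d}_E^{\rm eff} \equiv \frac{1}{\mathrm{Tr}_E\,\tilde{\Omega}_E^2} \ge \frac{d_R}{\tilde{d}_S}. \tag{55}$$

To transform this bound on $\langle\|\tilde{\rho}_S - \tilde{\Omega}_S\|_1\rangle$ into a bound on $\|\rho_S - \Omega_S\|_1$, we note that

$$\|\rho_S - \Omega_S\|_1 \le \|\rho_S - \tilde{\rho}_S\|_1 + \|\tilde{\rho}_S - \tilde{\Omega}_S\|_1 + \|\tilde{\Omega}_S - \Omega_S\|_1. \tag{56}$$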
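We bound $\|\rho_S - \tilde{\rho}_S\|_1$ as follows:

$$\begin{aligned}
\|\rho_S - \tilde{\rho}_S\|_1 &\le \big\|\,|\phi\rangle\langle\phi| - |\tilde{\phi}\rangle\langle\tilde{\phi}|\,\big\|_1 \\
&\le \sqrt{2}\,\sqrt{\mathrm{Tr}\big(|\phi\rangle\langle\phi| - |\tilde{\phi}\rangle\langle\tilde{\phi}|\big)^2} \\
&= \sqrt{2\big(1 - 2\langle\phi|\sqrt{X_R}|\phi\rangle^2 + \langle\phi|X_R|\phi\rangle^2\big)} \\
&\le \sqrt{2\big(1 - \langle\phi|X_R|\phi\rangle^2\big)} \\
&\le \sqrt{4\big(1 - \mathrm{Tr}(X_R|\phi\rangle\langle\phi|)\big)},
\end{aligned} \tag{57}$$

where $|\tilde{\phi}\rangle = \sqrt{X_R}\,|\phi\rangle$ and $\tilde{\rho}_S = \mathrm{Tr}_E(|\tilde{\phi}\rangle\langle\tilde{\phi}|)$. In the first inequality we have used the non-increase of the trace-norm under partial tracing, in the second inequality we have used Lemma 6 (Appendix A) and the fact that $|\tilde{\phi}\rangle$ and $|\phi\rangle$ span a two-dimensional subspace, and in the third inequality we have used the fact that $X_R \le \sqrt{X_R}$ (because $X_R \le \mathbb{1}$). It follows that

$$\langle\|\rho_S - \tilde{\rho}_S\|_1\rangle \le \sqrt{\big\langle 4\big(1 - \mathrm{Tr}(X_R|\phi\rangle\langle\phi|)\big)\big\rangle} = \sqrt{4\big(1 - \mathrm{Tr}(X_R\mathcal{E}_R)\big)} \le 2\sqrt{\delta}, \tag{58}$$

where we have used the concavity of the square root function and equation (48).

In addition, note that from the triangle inequality,

$$\|\Omega_S - \tilde{\Omega}_S\|_1 = \|\langle\rho_S - \tilde{\rho}_S\rangle\|_1 \le \langle\|\rho_S - \tilde{\rho}_S\|_1\rangle \le 2\sqrt{\delta}. \tag{59}$$

Inserting these results into the average of equation (56) we get

$$\langle\|\rho_S - \Omega_S\|_1\rangle \le \sqrt{\frac{\tilde{d}_S}{\tilde{d}_E^{\rm eff}}} + 4\sqrt{\delta}, \tag{60}$$

and inserting this in equation (24) we obtain Theorem 2.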



VI. METHOD II: APPLYING LEVY'S
LEMMA TO EXPECTATION VALUES

In this section, we describe an alternative method of obtaining bounds on $\|\rho_S - \Omega_S\|_1$ by considering the expectation values of a complete set of observables. The physical intuition is that if the expectations of all observables on two states are close to each other, then the states themselves must be close.

We begin by showing that for an arbitrary (bounded) observable $O_S$ on $S$, the difference in expectation value between a randomly chosen state $\rho_S = \mathrm{Tr}_E(|\phi\rangle\langle\phi|)$ and the canonical state $\Omega_S$ is small with high probability. We then proceed to show that this holds for a full operator basis, and thereby prove that $\rho_S \approx \Omega_S$ with high probability when $d_R \gg d_S^2$.

In this method, Levy's Lemma plays a far more central role. This approach may be more suitable in some situations, and yields further insights into the underlying structure of the problem.

A. Similarity of expectation values for random and canonical states

Consider Levy's Lemma applied to the expectation value of an operator $O_S$ on $\mathcal{H}_S$, for which we take

$$f(\phi) = \mathrm{Tr}(O_S\rho_S) \tag{61}$$

in (20). Let $O_S$ have bounded operator norm $\|O_S\|$ (where $\|O_S\|$ is the largest modulus of any eigenvalue of the operator). Then the Lipschitz constant of $f(\phi)$ is also bounded, satisfying $\eta \le 2\|O_S\|$ (as shown in appendix A). We therefore obtain

$$\mathrm{Prob}\Big[\big|\mathrm{Tr}(O_S\rho_S) - \langle\mathrm{Tr}(O_S\rho_S)\rangle\big| \ge \epsilon\Big] \le 2\exp\left(-\frac{C d_R \epsilon^2}{\|O_S\|^2}\right). \tag{62}$$

However, note that

$$\langle\mathrm{Tr}(O_S\rho_S)\rangle = \mathrm{Tr}(O_S\langle\rho_S\rangle) = \mathrm{Tr}(O_S\Omega_S), \tag{63}$$

and hence that

$$\mathrm{Prob}\Big[\big|\mathrm{Tr}(O_S\rho_S) - \mathrm{Tr}(O_S\Omega_S)\big| \ge \epsilon\Big] \le 2\exp\left(-\frac{C d_R \epsilon^2}{\|O_S\|^2}\right). \tag{64}$$

By choosing $\epsilon = d_R^{-1/3}$ we obtain the result that

$$\mathrm{Prob}\Big[\big|\mathrm{Tr}(O_S\rho_S) - \mathrm{Tr}(O_S\Omega_S)\big| \ge d_R^{-1/3}\Big] \le 2\exp\left(-\frac{C d_R^{1/3}}{\|O_S\|^2}\right). \tag{65}$$

For $d_R \gg 1$, the expectation value of any given bounded operator for a randomly chosen state will therefore be close to that of the canonical state with high probability [13].



B. Similarity of expectation values for a complete operator basis

Here we consider a complete basis of operators for the system. Rather than Hermitian operators, we find it convenient to consider a basis of unitary operators $U_S^x$. We show that with high probability all of these operators will have (complex) expectation values close to those of the canonical state.

It is always possible to define $d_S^2$ unitary operators $U_S^x$ on the system, labelled by $x \in \{0, 1, \ldots, d_S^2 - 1\}$, such that these operators form a complete orthogonal operator basis for $\mathcal{H}_S$ satisfying [14]

$$\mathrm{Tr}\big(U_S^{x\dagger} U_S^{y}\big) = d_S\,\delta_{xy}, \tag{66}$$

where $\delta_{xy}$ is the Kronecker delta function. One possible choice of $U_S^x$ is given by

$$U_S^x = \sum_{s=0}^{d_S - 1} e^{2\pi i s\,(x - (x \bmod d_S))/d_S^2}\; |(s + x) \bmod d_S\rangle\langle s|. \tag{67}$$

Noting that $\|U_S^x\| = 1\ \forall x$ (due to unitarity), we can then apply equation (64) to $O_S = U_S^x$ to obtain

$$\mathrm{Prob}\Big[\big|\mathrm{Tr}(U_S^x\rho_S) - \mathrm{Tr}(U_S^x\Omega_S)\big| \ge \epsilon\Big] \le 2e^{-C d_R \epsilon^2} \quad \forall x. \tag{68}$$

Furthermore, as there are only $d_S^2$ possible values of $x$, this implies that

$$\mathrm{Prob}\Big[\exists x : \big|\mathrm{Tr}(U_S^x\rho_S) - \mathrm{Tr}(U_S^x\Omega_S)\big| \ge \epsilon\Big] \le 2d_S^2\,e^{-C d_R \epsilon^2}. \tag{69}$$

If we take $\epsilon = d_R^{-1/3} \ll 1$, note that as the right hand side of (69) will be dominated by the exponential decay $e^{-C d_R^{1/3}}$, it is very likely that all operators $U_S^x$ will have expectation values close to their canonical values.

C. Obtaining a probabilistic bound on $\|\rho_S - \Omega_S\|_1$

As the $U_S^x$ form a complete basis, we can expand any state $\rho_S$ as

$$\rho_S = \frac{1}{d_S}\sum_x C_x(\rho_S)\,U_S^x, \tag{70}$$

where

$$C_x(\rho_S) = \mathrm{Tr}\big(U_S^{x\dagger}\rho_S\big) = \mathrm{Tr}\big(U_S^x\rho_S\big)^*. \tag{71}$$

Expressing equation (69) in these terms we obtain

$$\mathrm{Prob}\big[\exists x : |C_x(\rho_S) - C_x(\Omega_S)| \ge \epsilon\big] \le 2d_S^2\,e^{-C d_R \epsilon^2}. \tag{72}$$

When $|C_x(\rho_S) - C_x(\Omega_S)| \le \epsilon$ for all $x$, an upper bound can be obtained for the squared Hilbert-Schmidt norm [15] as follows:

$$\|\rho_S - \Omega_S\|_2^2 = \Big\|\frac{1}{d_S}\sum_x \big(C_x(\rho_S) - C_x(\Omega_S)\big)U_S^x\Big\|_2^2 = \frac{1}{d_S}\sum_x \big|C_x(\rho_S) - C_x(\Omega_S)\big|^2 \le d_S\,\epsilon^2. \tag{73}$$

Hence using the relation between the trace-norm and Hilbert-Schmidt-norm (proved in appendix A),

$$\|\rho_S - \Omega_S\|_1 \le \sqrt{d_S}\,\|\rho_S - \Omega_S\|_2 \le d_S\,\epsilon. \tag{74}$$

Incorporating this result into equation (72) yields

$$\mathrm{Prob}\big[\|\rho_S - \Omega_S\|_1 \ge d_S\,\epsilon\big] \le 2d_S^2\,e^{-C d_R \epsilon^2}. \tag{75}$$

If we choose

$$\epsilon = \frac{1}{d_S}\left(\frac{d_S^2}{d_R}\right)^{1/3} \tag{76}$$

we obtain the final result that

$$\mathrm{Prob}\left[\|\rho_S - \Omega_S\|_1 \ge \left(\frac{d_S^2}{d_R}\right)^{1/3}\right] \le 2d_S^2\,e^{-C\beta}, \tag{77}$$

where

$$\beta = \left(\frac{d_R}{d_S^2}\right)^{1/3}. \tag{78}$$

Note that $\|\rho_S - \Omega_S\|_1 \approx 0$ with high probability whenever $\beta \gg \log_2(d_S) \gg 1$, and hence when $d_R \gg d_S^2$. This result is qualitatively similar to the result obtained using the previous method, although it can be shown that the bound obtained is actually slightly weaker in this case.

VII. EXAMPLE: SPIN CHAIN 
WITH np EXCITATIONS 

As a concrete example of the above formalism, consider a chain of $n$ spin-1/2 systems in an external magnetic field in the $+z$ direction, where the first $k$ spins form the system, and the remaining $n - k$ spins form the environment. We therefore consider a Hamiltonian of the form

$$H = -\sum_{i=1}^{n} \frac{B}{2}\,\sigma_z^{(i)}, \tag{79}$$

where $B$ is a constant energy (proportional to the external field strength), and $\sigma_z^{(i)}$ is a Pauli spin operator for the $i$-th spin.

Under these circumstances, the global energy eigenstates can be divided into orthogonal subspaces dependent on the total number of spins aligned with the field. We consider a restriction to one of these degenerate subspaces $\mathcal{H}_R \subseteq \mathcal{H}_S \otimes \mathcal{H}_E$ in which $np$ spins are in the excited state $|1\rangle$ (opposite to the field) and the remaining $n(1 - p)$ spins are in the ground state $|0\rangle$ (aligned with the field).

With this setup, $d_S = 2^k$ and

$$d_R = \binom{n}{np}. \tag{80}$$

Approximating this binomial coefficient by an exponential (as in Appendix C), gives

$$d_R \ge \frac{2^{nH(p)}}{n + 1}, \tag{81}$$

where $H(p) = -p\log_2(p) - (1 - p)\log_2(1 - p)$ (the Shannon entropy of a single spin).
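From Theorem 1,

$$\mathrm{Prob}\big[\|\rho_S - \Omega_S\|_1 \ge \eta\big] \le \eta', \tag{82}$$

where

$$\eta = \epsilon + \sqrt{\frac{d_S}{d_E^{\rm eff}}}, \tag{83}$$

$$\eta' = 2\exp\left(-C d_R \epsilon^2\right). \tag{84}$$

In addition,

$$\sqrt{\frac{d_S}{d_E^{\rm eff}}} \le \sqrt{\frac{d_S^2}{d_R}} \le \sqrt{n + 1}\; 2^{-(nH(p) - 2k)/2}. \tag{85}$$

For an appropriate choice of $\epsilon$ (e.g. $\epsilon = d_R^{-1/3} \ll 1$), we will obtain $\|\rho_S - \Omega_S\|_1 \approx 0$ with high probability whenever

$$\sqrt{n + 1}\; 2^{-(nH(p) - 2k)/2} \ll 1. \tag{86}$$

For fixed $p$ and $k$, this condition will be satisfied for all sufficiently large $n$.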
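The dimension counts for the spin chain are easy to evaluate exactly, which gives a direct check of the entropy bound on $d_R$ and of the analytic bound on $\sqrt{d_S^2/d_R}$. The sketch below is illustrative: the function names and the sample values of $n$, $k$ and the excitation number are assumptions chosen for the example, and $p$ is taken to be exactly the excitation number divided by $n$.

```python
import math


def binary_entropy(p):
    """H(p) = -p log2(p) - (1 - p) log2(1 - p), for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def spin_chain_bound(n, k, excitations):
    """Dimensions and bounds for n spins, k of them forming the system.

    Returns (d_R, lower bound on d_R, sqrt(d_S^2 / d_R), analytic bound).
    """
    p = excitations / n
    d_S = 2 ** k
    d_R = math.comb(n, excitations)
    entropy_lower_bound = 2 ** (n * binary_entropy(p)) / (n + 1)
    distance_term = math.sqrt(d_S ** 2 / d_R)
    analytic_bound = math.sqrt(n + 1) * 2 ** (-(n * binary_entropy(p) - 2 * k) / 2)
    return d_R, entropy_lower_bound, distance_term, analytic_bound


d_R, lower, distance_term, analytic_bound = spin_chain_bound(n=100, k=5, excitations=30)
print(d_R, lower, distance_term, analytic_bound)
```

Even for a modest chain of 100 spins the distance term is already many orders of magnitude below one for a five-spin system, and increasing $n$ at fixed $k$ and $p$ reduces it further.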

We emphasise that our results concern the distance between $\rho_S$ and $\Omega_S$. Computing the precise form of $\Omega_S$ is a standard exercise in statistical mechanics, which we sketch here for completeness.

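In the regime where $n \gg k^2$, the canonical state $\Omega_S$ will
take the approximate form (with $|s|$ denoting the number of
excitations in the system basis state $|s\rangle$)

\begin{align}
\Omega_S &= \sum_s \frac{(n-k)!}{d_R\,(np-|s|)!\,\bigl(n(1-p)-(k-|s|)\bigr)!}\;|s\rangle\langle s| \nonumber\\
&\approx \sum_s \frac{n!\,(np)^{|s|}\,(n(1-p))^{k-|s|}}{d_R\,n^{k}\,(np)!\,(n(1-p))!}\;|s\rangle\langle s| \nonumber\\
&= \sum_s p^{|s|}(1-p)^{k-|s|}\,|s\rangle\langle s| \tag{87}\\
&= \bigl(p\,|1\rangle\langle 1| + (1-p)\,|0\rangle\langle 0|\bigr)^{\otimes k}, \tag{88}
\end{align}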



and hence the canonical state of the system will approxi- 
mate that of k uncorrelated spins, each with a probability 
p of being excited, as expected. 
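The quality of this product-state approximation can be checked numerically. The sketch below compares the exact diagonal weight of $\Omega_S$, $\binom{n-k}{np-|s|}/\binom{n}{np}$, with $p^{|s|}(1-p)^{k-|s|}$. The function names and the values n = 10000, k = 4, p = 0.25 are illustrative assumptions chosen so that n is much larger than k squared.

```python
import math

def exact_weight(n: int, k: int, excitations: int, s: int) -> float:
    """Exact diagonal entry of Omega_S for a system basis state with s
    excited spins: binom(n-k, np-s) / binom(n, np)."""
    return math.comb(n - k, excitations - s) / math.comb(n, excitations)

def product_weight(k: int, p: float, s: int) -> float:
    """Approximate entry p^s (1-p)^(k-s) from the product form (88)."""
    return p ** s * (1 - p) ** (k - s)

n, k, p = 10_000, 4, 0.25      # illustrative values with n >> k^2
excitations = round(n * p)     # np, assumed to be an integer

for s in range(k + 1):
    print(s, exact_weight(n, k, excitations, s), product_weight(k, p, s))
```

Python's arbitrary-precision integers make the ratio of the two large binomial coefficients exact up to the final floating-point division.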

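To connect our result to the standard statistical mechanical formula,

$$\Omega_S \propto \exp\left(-\frac{H_S}{k_B T}\right), \tag{89}$$

we use Boltzmann's formula relating the entropy of the environment
$S_E(|e|)$ to the number of states $N_E(|e|)$ of the environment with
a given number of excitations $|e|$ to get

\begin{align}
S_E(|e|) &= k_B \ln N_E(|e|) \nonumber\\
&= k_B \ln \binom{n-k}{|e|} \nonumber\\
&\approx k_B \bigl( (n-k)\ln(n-k) - |e|\ln|e| - (n-k-|e|)\ln(n-k-|e|) \bigr), \tag{90}
\end{align}

where in the third line we have used Stirling's approximation.
Defining the temperature in the usual way, and noting that the energy
of the environment is given by $E = |e|B - (n-k)B/2$, we obtain

\begin{align}
\frac{1}{T} &= \left.\frac{dS_E(E)}{dE}\right|_{E=\langle E\rangle} \nonumber\\
&= \frac{1}{B}\left.\frac{dS_E(|e|)}{d|e|}\right|_{|e|=(n-k)p} \nonumber\\
&= \frac{k_B}{B}\left.\ln\left(\frac{n-k-|e|}{|e|}\right)\right|_{|e|=(n-k)p} \nonumber\\
&= \frac{k_B}{B}\ln\left(\frac{1-p}{p}\right). \tag{91}
\end{align}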



This formula expresses how the probability p defines 
a temperature T of the environment. Rearranging equa- 
tion (87) to incorporate equation (91) gives the usual 
statistical mechanical result 



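\begin{align}
\Omega_S &\approx (1-p)^{k} \sum_s \left(\frac{p}{1-p}\right)^{|s|} |s\rangle\langle s| \nonumber\\
&= (1-p)^{k} \sum_s \exp\left(-|s|\ln\frac{1-p}{p}\right) |s\rangle\langle s| \nonumber\\
&= (1-p)^{k} \sum_s \exp\left(-\frac{|s|B}{k_B T}\right) |s\rangle\langle s| \nonumber\\
&\propto \exp\left(-\frac{H_S}{k_B T}\right). \tag{92}
\end{align}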
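The relation between p and T can be made concrete with a short Python sketch. It evaluates equation (91) and checks that the Boltzmann factor $\exp(-B/k_B T)$ reproduces the population ratio $p/(1-p)$ appearing in (92). Setting $k_B = 1$, the function names and the sample values are assumptions made for illustration; the formula requires $0 < p < 1/2$ for a positive, finite temperature.

```python
import math

K_B = 1.0  # Boltzmann constant in natural units (assumption for illustration)

def inverse_temperature(p: float, B: float) -> float:
    """1/T from equation (91): (k_B / B) * ln((1 - p) / p)."""
    return (K_B / B) * math.log((1 - p) / p)

def excitation_probability(T: float, B: float) -> float:
    """Invert (91): p = 1 / (1 + exp(B / (k_B T)))."""
    return 1.0 / (1.0 + math.exp(B / (K_B * T)))

B, p = 1.0, 0.25                       # illustrative values
T = 1.0 / inverse_temperature(p, B)

# The Boltzmann factor for one excitation should equal p / (1 - p).
print(T, math.exp(-B / (K_B * T)), p / (1 - p))
```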



A. Projection on the typical subspace 

We can obtain an improved bound on $\|\rho_S - \Omega_S\|_1$ by noting
that the system state almost always lies in a typical subspace with
approximately kp excitations. We make use of this observation by
applying Theorem 2 with a measurement operator $X_R$ given by

$$X_R = \Pi_S \otimes \mathbb{1}_E, \tag{93}$$

where $\Pi_S$ is a projector onto the typical subspace of the system,
in which it contains a number of excitations $|s|$ in the range

$$kp - \xi \leq |s| \leq kp + \xi. \tag{94}$$

It is easy to show, using classical probabilistic arguments (see
Appendix B), that

$$\mathrm{Tr}_R(X_R\,\mathcal{E}_R) = \mathrm{Tr}_S(\Pi_S\,\Omega_S) \geq 1 - \delta, \tag{95}$$

where

$$\delta = 2\exp\left(-\frac{\xi^2}{4kp(1-p)}\right). \tag{96}$$



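Furthermore, the dimension $\tilde{d}_S$ of the support of $X_R$ on
$\mathcal{H}_S$ (which here is simply the dimension of the typical
subspace) is shown in Appendix C to be given by

$$\tilde{d}_S = \sum_{|s|=kp-\xi}^{kp+\xi} \binom{k}{|s|} \leq (2\xi+1)\, 2^{kH(p)+\xi G(p)}, \tag{97}$$

where

$$G(p) = \left|\frac{dH(p)}{dp}\right| = \left|\log_2\left(\frac{1-p}{p}\right)\right|. \tag{98}$$

From Theorem 2 we obtain:

$$\mathrm{Prob}\left[\,\|\rho_S - \Omega_S\|_1 \geq \tilde\eta\,\right] \leq \tilde\eta', \tag{99}$$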



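where, using $\tilde{d}_E^{\rm eff} \geq d_R/\tilde{d}_S$, and
inserting the results of equations (81), (96) and (97),

\begin{align}
\tilde\eta &\leq \varepsilon + \sqrt{n+1}\,(2\xi+1)\, 2^{(k-n/2)H(p)+\xi G(p)} + \sqrt{32}\,\exp\left(-\frac{\xi^2}{8kp(1-p)}\right), \tag{100}\\
\tilde\eta' &= 2\exp\left(-C d_R \varepsilon^2\right). \tag{101}
\end{align}

Choosing $\xi = k^{2/3}$ and $\varepsilon = d_R^{-1/3}$ yields

\begin{align}
\tilde\eta &\leq (n+1)^{1/3}\, 2^{-nH(p)/3} + \sqrt{n+1}\,(2k^{2/3}+1)\, 2^{-(n-2k)H(p)/2 + k^{2/3}G(p)} + \sqrt{32}\,\exp\left(-\frac{k^{1/3}}{8p(1-p)}\right), \tag{102}\\
\tilde\eta' &\leq 2\exp\left(-\frac{C\,2^{nH(p)/3}}{(n+1)^{1/3}}\right). \tag{103}
\end{align}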



In the thermodynamic limit in which p is fixed (corresponding to the
temperature), the ratio of the system and environment sizes
r = k/(n − k) is fixed at some value r < 1 (i.e. the system is smaller
than the environment), and n tends to infinity, $\tilde\eta \to 0$ and
$\tilde\eta' \to 0$, and hence $\rho_S \to \Omega_S$.

For large (but finite) n the system will be thermalised for almost all
states when the system is smaller than the environment (i.e. r < 1).
Note that as $\tilde\eta$ depends exponentially on (n − 2k),
$\tilde\eta \ll 1$ can be achieved with only small differences in the
number of spins in the system and environment.
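To get a feel for the size of these bounds, the following Python sketch evaluates the right-hand sides of (102) and (103) at fixed ratio r = 1/2 (that is, n = 3k). The value $C = 1/(18\pi^3)$ is assumed here to be the constant of Theorem 1; the function names, the clipping threshold for very large exponents, and the sample sizes are likewise illustrative assumptions.

```python
import math

C = 1.0 / (18 * math.pi ** 3)  # constant of Theorem 1 (assumed value)

def H(p: float) -> float:
    """Binary Shannon entropy in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def G(p: float) -> float:
    """|dH/dp| = |log2((1 - p) / p)|, as in (98)."""
    return abs(math.log2((1 - p) / p))

def pow2(exponent: float) -> float:
    """2**exponent, returning infinity instead of raising on overflow."""
    return float("inf") if exponent > 1000 else 2.0 ** exponent

def eta_bound(n: int, k: int, p: float) -> float:
    """Right-hand side of (102), with xi = k^(2/3)."""
    xi = k ** (2 / 3)
    term1 = pow2(math.log2(n + 1) / 3 - n * H(p) / 3)
    term2 = pow2(0.5 * math.log2(n + 1) + math.log2(2 * xi + 1)
                 - (n - 2 * k) * H(p) / 2 + xi * G(p))
    term3 = math.sqrt(32) * math.exp(-k ** (1 / 3) / (8 * p * (1 - p)))
    return term1 + term2 + term3

def eta_prime_bound(n: int, p: float) -> float:
    """Right-hand side of (103)."""
    exponent = n * H(p) / 3 - math.log2(n + 1) / 3
    if exponent > 1000:   # exp(-C * 2**exponent) underflows to zero
        return 0.0
    return 2 * math.exp(-C * 2.0 ** exponent)

for k in (1000, 8000):
    n = 3 * k             # r = k / (n - k) = 1/2
    print(n, k, eta_bound(n, k, 0.25), eta_prime_bound(n, 0.25))
```

In this sketch the last term of (102), which decays only as $\exp(-k^{1/3}/(8p(1-p)))$, dominates for these sizes, so the bound becomes small only for fairly large k.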



VIII. CONCLUSIONS 

Let us look back at what we have done. Concerning the problem of
thermalisation of a system interacting with an environment in
statistical mechanics, there are
several standard approaches. One way of looking at it 
is to say that the only thing we know about the state of 
the universe is a global constraint such as its total energy. 
Thus the way to proceed is to take a Bayesian point of 
view and consider all states consistent with this global 
constraint to be equally probable. The average over all 
these states indeed leads to the state of any small subsys- 
tem being canonical. But the question then arises what 
is the meaning of this average, when we deal with just 
one state. Also, these probabilities are subjective, and
this raises the problem of how to argue for an objective 
meaning of the entropy. A formal way out is that sug- 
gested by Gibbs, to consider an ensemble of universes, 
but of course this doesn't solve the puzzle, because there 
is usually only one actual universe. Alternatively, it was 
suggested that the state of the universe, as it evolves in 
time, can reach any of the states that are consistent with 
the global constraint. Thus if we look at time averages, 
they are the same as the average that results from consid- 
ering each state of the universe to be equally probable. 
To make sense of this image one needs assumptions of 
ergodicity, to ensure that the universe explores all the 
available space equally, and of course this doesn't solve 
the problem of what the state of the subsystem is at a 
given time. 

What we showed here is that these averages are not 
necessary. Rather, (almost) any individual state of the 
universe is such that any sufficiently small subsystem be- 
haves as if the universe were in the equiprobable aver- 
age state. This is due to massive entanglement between 
the subsystem and the rest of the universe, which is a 
generic feature of the vast majority of states. To obtain this result,
we have introduced measures of the effective size of the system,
$d_S$, and its environment (i.e. the rest of the universe),
$d_E^{\rm eff}$, and showed that the average distance between the
individual reduced states and the canonical state is directly related
to $d_S/d_E^{\rm eff}$. Levy's
Lemma is then invoked to conclude that all but an ex- 
ponentially small fraction of all states are close to the 
canonical state. 

In conclusion, the main message of our paper is that 
averages are not needed in order to justify the canonical 
state of a system in contact with the rest of the universe 
- almost any individual state of the universe is enough 
to lead to the canonical state. In effect, we propose to 
replace the Postulate of Equal a priori Probabilities by 
the Principle of Apparently Equal a priori Probabilities, 
which states that as far as the system is concerned every 
single state of the universe seems similar to the average. 

We stress once more that we are concerned only with 
the distance between the state of the system and the 
canonical state, and not with the precise mathematical 
form of this canonical state. Indeed, it is an advantage
of our method that these two issues are completely separated.
For example, our result is independent of the
canonical state having Boltzmannian form, of degenera- 
cies of energy levels, of interaction strength, or of energy 
(of system, environment or the universe) at all. 

In future work [17], we will go beyond the kinematic 
viewpoint presented here to address the dynamics of ther- 
malisation. In particular, we will investigate under what 
conditions the state of the universe will evolve into (and 
spend almost all of its later time in) the large region of 
its Hilbert space in which its subsystems are thermalised. 



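APPENDIX A: LIPSCHITZ CONSTANTS
AND NORM RELATION

Lemma 4 The Lipschitz constant $\eta$ of the function
$f(\phi) = \|\rho_S - \Omega_S\|_1$ satisfies $\eta \leq 2$.

Proof: Defining the reduced states
$\rho_1 = \mathrm{Tr}_E(|\phi_1\rangle\langle\phi_1|)$ and
$\rho_2 = \mathrm{Tr}_E(|\phi_2\rangle\langle\phi_2|)$, and using the
result that partial tracing cannot increase the trace-norm,

\begin{align}
|f(\phi_1) - f(\phi_2)|^2 &= \bigl|\, \|\rho_1 - \Omega_S\|_1 - \|\rho_2 - \Omega_S\|_1 \bigr|^2 \nonumber\\
&\leq \|\rho_1 - \rho_2\|_1^2 \nonumber\\
&\leq \bigl\|\, |\phi_1\rangle\langle\phi_1| - |\phi_2\rangle\langle\phi_2| \,\bigr\|_1^2 \nonumber\\
&= 4\left(1 - |\langle\phi_1|\phi_2\rangle|^2\right) \nonumber\\
&\leq 4\,\bigl|\, |\phi_1\rangle - |\phi_2\rangle \bigr|^2. \tag{A1}
\end{align}

Hence $|f(\phi_1) - f(\phi_2)| \leq 2\,\bigl|\,|\phi_1\rangle - |\phi_2\rangle\bigr|$,
and thus $\eta \leq 2$. □

Lemma 5 The Lipschitz constant $\eta$ of the function
$f(\phi) = \mathrm{Tr}(X|\phi\rangle\langle\phi|)$, where X is any
operator on $\mathcal{H}_R$ with finite operator norm $\|X\|$,
satisfies $\eta \leq 2\|X\|$.

Proof:

\begin{align}
|f(\phi_1) - f(\phi_2)| &= \bigl|\langle\phi_1|X|\phi_1\rangle - \langle\phi_2|X|\phi_2\rangle\bigr| \nonumber\\
&= \tfrac{1}{2}\bigl| (\langle\phi_1| + \langle\phi_2|)\,X\,(|\phi_1\rangle - |\phi_2\rangle) + (\langle\phi_1| - \langle\phi_2|)\,X\,(|\phi_1\rangle + |\phi_2\rangle) \bigr| \nonumber\\
&\leq \|X\|\,\bigl|\,|\phi_1\rangle + |\phi_2\rangle\bigr|\,\bigl|\,|\phi_1\rangle - |\phi_2\rangle\bigr| \nonumber\\
&\leq 2\|X\|\,\bigl|\,|\phi_1\rangle - |\phi_2\rangle\bigr|. \qquad\Box \tag{A2}
\end{align}

Lemma 6 For any $n \times n$ matrix M, $\|M\|_1 \leq \sqrt{n}\,\|M\|_2$.

Proof: If M has eigenvalues $\lambda_i$,

$$\|M\|_1^2 = n^2\left(\frac{1}{n}\sum_i |\lambda_i|\right)^2 \leq n^2\,\frac{1}{n}\sum_i |\lambda_i|^2 = n\,\|M\|_2^2$$

by the convexity of the square function. Taking the square-root yields
the desired result. □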
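The bounds of Lemma 5 and Lemma 6 are easy to probe numerically. The Python sketch below samples random pure states and checks the Lipschitz bound of Lemma 5 for the special case of a diagonal operator X (an assumption made so that the operator norm is simply the largest diagonal entry in absolute value), and then checks the norm inequality of Lemma 6 on a random list of real eigenvalues. All names, dimensions and sample counts are illustrative.

```python
import math
import random

def random_state(dim: int, rng: random.Random) -> list:
    """Normalised random complex vector with Gaussian components."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / norm for z in v]

def expectation(diag_x: list, phi: list) -> float:
    """f(phi) = Tr(X |phi><phi|) for a diagonal operator X."""
    return sum(x * abs(z) ** 2 for x, z in zip(diag_x, phi))

def distance(a: list, b: list) -> float:
    """Euclidean distance | |a> - |b> | between two state vectors."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(0)
dim = 8
diag_x = [rng.random() for _ in range(dim)]  # diagonal X with 0 <= X <= 1
op_norm = max(abs(x) for x in diag_x)        # operator norm of a diagonal X

# Lemma 5: |f(phi1) - f(phi2)| <= 2 ||X|| | |phi1> - |phi2> |
worst = 0.0
for _ in range(1000):
    phi1, phi2 = random_state(dim, rng), random_state(dim, rng)
    diff = abs(expectation(diag_x, phi1) - expectation(diag_x, phi2))
    worst = max(worst, diff / distance(phi1, phi2))
print("largest observed ratio:", worst, "bound:", 2 * op_norm)

# Lemma 6: ||M||_1 <= sqrt(n) ||M||_2, evaluated from the eigenvalues.
eigs = [rng.uniform(-1, 1) for _ in range(dim)]
trace_norm = sum(abs(lam) for lam in eigs)
two_norm = math.sqrt(sum(lam * lam for lam in eigs))
print(trace_norm, "<=", math.sqrt(dim) * two_norm)
```

Random sampling cannot prove the bounds, but a violation would point to a mistake in the transcription of the inequalities.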



APPENDIX B: PROJECTION ONTO
THE TYPICAL SUBSPACE

Lemma 7 Given a system in the canonical state $\Omega_S$, the
probability of it containing a number of excitations $|s|$ in the
range $kp - \xi \leq |s| \leq kp + \xi$ satisfies

$$\mathrm{Tr}(\Pi_S\,\Omega_S) \geq 1 - \delta, \tag{B1}$$

where

$$\delta = 2\exp\left(-\frac{\xi^2}{4kp(1-p)}\right). \tag{B2}$$



Proof: $\Omega_S$ is essentially a classical probabilistic state,
obtained by choosing k spins at random from a 'bag' containing np
excited spins and n(1 − p) un-excited spins without replacement. It is
easy to see that this state will lie in the typical subspace with
higher probability than if the spins were replaced in the bag after
each selection, as the former process is mean reverting, whereas the
latter is not. We can bound the probability of lying outside the
typical subspace in the case with replacement using Chernoff's
inequality [16] for the sum $X = \sum_{i=1}^{k}(s_i - p)$, where
$s_i \in \{0, 1\}$ is the value of the i-th spin. This gives

$$\mathrm{Prob}\left[\,|X| \geq \xi\,\right] \leq 2e^{-\xi^2/(4\sigma^2)}, \tag{B3}$$

where $\sigma^2 = kp(1-p)$ is the variance of X. Hence

$$\mathrm{Prob}\left[\,\bigl||s| - kp\bigr| \geq \xi\,\right] \leq 2e^{-\xi^2/(4kp(1-p))}, \tag{B4}$$

and thus

\begin{align}
\mathrm{Tr}(\Pi_S\,\Omega_S) &= 1 - \mathrm{Prob}\left[\,\bigl||s| - kp\bigr| > \xi\,\right] \nonumber\\
&\geq 1 - 2e^{-\xi^2/(4kp(1-p))}. \qquad\Box \tag{B5}
\end{align}
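Because the excitation number of the system follows a hypergeometric distribution, the statement of Lemma 7 can be compared against an exact computation. The Python sketch below does this for one illustrative parameter set; the function names and the values n = 400, k = 100, p = 0.25 and $\xi = k^{2/3}$ are assumptions made for the example.

```python
import math

def typical_probability(n: int, k: int, excitations: int, xi: float) -> float:
    """Exact Tr(Pi_S Omega_S): probability that k spins drawn without
    replacement from n spins (`excitations` of them excited) contain a
    number s of excitations with |s - kp| <= xi."""
    p = excitations / n
    total = math.comb(n, excitations)
    prob = 0.0
    for s in range(k + 1):
        if abs(s - k * p) <= xi and 0 <= excitations - s <= n - k:
            prob += math.comb(k, s) * math.comb(n - k, excitations - s) / total
    return prob

def delta(k: int, p: float, xi: float) -> float:
    """Bound (B2): 2 exp(-xi^2 / (4 k p (1 - p)))."""
    return 2 * math.exp(-xi ** 2 / (4 * k * p * (1 - p)))

n, k, excitations = 400, 100, 100   # illustrative values, p = 0.25
p = excitations / n
xi = k ** (2 / 3)

print(typical_probability(n, k, excitations, xi), ">=", 1 - delta(k, p, xi))
```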



APPENDIX C: EXPONENTIAL BOUNDS 
ON COMBINATORIAL QUANTITIES 

In this appendix we obtain bounds for the combinatoric quantities
required to consider the example case of a spin-chain.

From standard probability theory we know that

$$\sum_{k=0}^{n} \binom{n}{k}\, p^{k}(1-p)^{n-k} = 1, \tag{C1}$$

with the maximal term in the sum being obtained when k = np. Hence, as
the sum contains n + 1 terms,

$$\frac{1}{n+1} \leq \binom{n}{np}\, p^{np}(1-p)^{n(1-p)} \leq 1. \tag{C2}$$

Noting that

$$p^{np}(1-p)^{n(1-p)} = 2^{-nH(p)}, \tag{C3}$$

where $H(p) = -p\log_2(p) - (1-p)\log_2(1-p)$, we can rearrange
equation (C2) to get

$$2^{nH(p)-\log_2(n+1)} \leq \binom{n}{np} \leq 2^{nH(p)}. \tag{C4}$$
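The two-sided bound (C4) can be verified directly for small n. The following Python sketch checks it for every integer number of excitations m = np with n below 60; the function names and the range of n are illustrative choices.

```python
import math

def H(p: float) -> float:
    """Binary Shannon entropy in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def check_c4(n: int, m: int) -> bool:
    """Check 2^{nH(p) - log2(n+1)} <= binom(n, m) <= 2^{nH(p)} for p = m/n,
    comparing base-2 logarithms to avoid huge numbers."""
    p = m / n
    log2_binom = math.log2(math.comb(n, m))
    upper = n * H(p)
    lower = upper - math.log2(n + 1)
    return lower <= log2_binom <= upper

print(all(check_c4(n, m) for n in range(1, 60) for m in range(n + 1)))
```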

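We also require an upper bound for the dimension of the typical
subspace of system S, given by

$$\tilde{d}_S = \sum_{|s|=kp-\xi}^{kp+\xi} \binom{k}{|s|}. \tag{C5}$$

The maximal term in this sum occurs when $|s| = k\tilde{p}$, where

$$\tilde{p} = \begin{cases}
p + \xi/k, & p < \tfrac{1}{2} - \xi/k \\
\tfrac{1}{2}, & |p - \tfrac{1}{2}| \leq \xi/k \\
p - \xi/k, & p > \tfrac{1}{2} + \xi/k,
\end{cases} \tag{C6}$$

and as the sum consists of $(2\xi + 1)$ terms,

$$\tilde{d}_S \leq (2\xi+1) \binom{k}{k\tilde{p}}. \tag{C7}$$

Bounding the binomial coefficient by an exponential as above we obtain

$$\tilde{d}_S \leq (2\xi+1)\, 2^{kH(\tilde{p})}. \tag{C8}$$

As H(p) is a concave function of p, we also note that

$$kH(\tilde{p}) - kH(p) \leq \xi \left|\frac{dH(p)}{dp}\right|. \tag{C9}$$

Defining

$$G(p) = \left|\frac{dH(p)}{dp}\right| = \left|\log_2\left(\frac{1-p}{p}\right)\right|, \tag{C10}$$

we therefore find that

$$\tilde{d}_S \leq (2\xi+1)\, 2^{kH(p)+\xi G(p)}. \tag{C11}$$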
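As a sanity check on the final bound (C11), the Python sketch below computes the dimension of the typical subspace exactly and compares its base-2 logarithm with that of the bound. The function names and the values k = 100, $\xi = k^{2/3}$, and p = 0.25 or 0.5 are illustrative assumptions.

```python
import math

def H(p: float) -> float:
    """Binary Shannon entropy in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def G(p: float) -> float:
    """|dH/dp| = |log2((1 - p) / p)|, as in (C10)."""
    return abs(math.log2((1 - p) / p))

def typical_dimension(k: int, p: float, xi: float) -> int:
    """Exact dimension: sum of binom(k, s) over integers s with |s - kp| <= xi."""
    return sum(math.comb(k, s) for s in range(k + 1) if abs(s - k * p) <= xi)

def log2_dimension_bound(k: int, p: float, xi: float) -> float:
    """log2 of the bound (C11): (2 xi + 1) * 2^{k H(p) + xi G(p)}."""
    return math.log2(2 * xi + 1) + k * H(p) + xi * G(p)

k = 100
xi = k ** (2 / 3)
for p in (0.25, 0.5):
    print(p, math.log2(typical_dimension(k, p, xi)), "<=",
          log2_dimension_bound(k, p, xi))
```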